Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question and answer). However, data curation for document QA is uniquely challenging because the context (i.e. answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from extracted texts to form well-posed contexts; (3) QA to extract knowledge from contexts to return high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
translated by 谷歌翻译
本文向许多受访者调查了同时的偏好和度量学习。一组由$ d $二维功能向量和表格的配对比较``项目$ i $都比item $ j $更可取'的项目。我们的模型共同学习了一个距离指标,该指标表征了人群对项目相似性的一般度量,以及每个用户反映其个人喜好的潜在理想点。该模型具有捕获个人喜好的灵活性,同时享受在人群中摊销的度量学习样本成本。我们首先以无声的,连续的响应设置(即等于项目距离的差异)来研究这个问题,以了解学习的基本限制。接下来,我们建立了嘈杂的预测错误保证,可以从人类受访者那里收集诸如二进制测量值,并显示样品复杂性在基础度量较低时如何提高。最后,我们根据响应分布的假设建立恢复保证。我们在模拟数据和大量用户的颜色偏好判断数据集上演示了模型的性能。
translated by 谷歌翻译
A surprising phenomenon in modern machine learning is the ability of a highly overparameterized model to generalize well (small error on the test data) even when it is trained to memorize the training data (zero error on the training data). This has led to an arms race towards increasingly overparameterized models (c.f., deep learning). In this paper, we study an underexplored hidden cost of overparameterization: the fact that overparameterized models may be more vulnerable to privacy attacks, in particular the membership inference attack that predicts the (potentially sensitive) examples used to train a model. We significantly extend the relatively few empirical results on this problem by theoretically proving for an overparameterized linear regression model in the Gaussian data setting that membership inference vulnerability increases with the number of parameters. Moreover, a range of empirical studies indicates that more complex, nonlinear models exhibit the same behavior. Finally, we extend our analysis towards ridge-regularized linear regression and show in the Gaussian data setting that increased regularization also increases membership inference vulnerability in the overparameterized regime.
translated by 谷歌翻译
我们考虑在可实现的环境中进行交互式学习,并开发一般框架,以处理从最佳ARM识别到主动分类的问题。我们开始调查,即观察到可怕算法\ emph {无法实现可实现的设置中最佳最佳状态。因此,我们设计了新的计算有效的算法,可实现最可实现的设置,该算法与对数因子的最小限制相匹配,并且是通用的,适用于包括内核方法的各种功能类,H {\“O}偏置函数,以及凸起功能。我们的算法的样本复杂性可以在众所周知的数量中量化,如延长的教学尺寸和干草堆维度。然而,与直接基于这些组合量的算法不同,我们的算法是计算效率的。实现计算效率,我们的算法使用Monte Carlo“命令运行”算法来从版本空间中的样本,而不是明确地维护版本空间。我们的方法有两个关键优势。首先,简单,由两个统一,贪婪的算法组成。第二,我们的算法具有能够无缝地利用经常可用和在实践中有用的知识。此外为了我们的新理论结果,我们经验证明我们的算法与高斯过程UCB方法具有竞争力。
translated by 谷歌翻译
级别设置估计问题旨在查找域$ {\ cal x} $的所有点,其中一个未知函数$ f:{\ cal x} \ lightarrow \ mathbb {r} $超过阈值$ \ alpha $ 。估计基于可以在$ {\ cal x} $中顺序和自适应地选择的位置获取的嘈杂函数评估。阈值$ \ alpha $可以是\弹性{显式},并提供先验,或\ \ ich {隐式},相对于最佳函数值定义,即$ \ alpha =(1- \ epsilon)f(x_ \ AST)$关于给定$ \ epsilon> 0 $ why $ f(x_ \ ist)$是最大函数值,并且未知。在这项工作中,我们通过将其与最近的自适应实验设计方法相关联,为近期自适应实验设计方法提供了一种新的再现内核盗窃空间(RKHS)设置。我们假设可以通过RKHS中的函数近似于未知的拼写,并为此设置中隐含和显式案件提供新的算法,具有很强的理论保证。此外,在线性(内核)设置中,我们表明我们的界限几乎是最佳的,即,我们的上限与阈值线性匪徒的现有下限匹配。据我们所知,这项工作提供了第一个实例依赖性非渐近的上限,就匹配信息理论下限的水平设定估计的样本复杂性。
translated by 谷歌翻译
The generalisation performance of a convolutional neural networks (CNN) is majorly predisposed by the quantity, quality, and diversity of the training images. All the training data needs to be annotated in-hand before, in many real-world applications data is easy to acquire but expensive and time-consuming to label. The goal of the Active learning for the task is to draw most informative samples from the unlabeled pool which can used for training after annotation. With total different objective, self-supervised learning which have been gaining meteoric popularity by closing the gap in performance with supervised methods on large computer vision benchmarks. self-supervised learning (SSL) these days have shown to produce low-level representations that are invariant to distortions of the input sample and can encode invariance to artificially created distortions, e.g. rotation, solarization, cropping etc. self-supervised learning (SSL) approaches rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of active learning using self-supervised learning mainfold and propose Deep Active Learning using BarlowTwins(DALBT), an active learning method for all the datasets using combination of classifier trained along with self-supervised loss framework of Barlow Twins to a setting where the model can encode the invariance of artificially created distortions, e.g. rotation, solarization, cropping etc.
translated by 谷歌翻译
For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime. However, after a critical sample size $P^*$, we empirically find the finite-width network generalization becomes worse than that of the infinite width network. In this work, we empirically study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$. We find that finite-size effects can become relevant for very small dataset sizes on the order of $P^* \sim \sqrt{N}$ for polynomial regression with ReLU networks. We discuss the source of these effects using an argument based on the variance of the NN's final neural tangent kernel (NTK). This transition can be pushed to larger $P$ by enhancing feature learning or by ensemble averaging the networks. We find that the learning curve for regression with the final NTK is an accurate approximation of the NN learning curve. Using this, we provide a toy model which also exhibits $P^* \sim \sqrt{N}$ scaling and has $P$-dependent benefits from feature learning.
translated by 谷歌翻译
The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations due to differences in hardware and acquisition parameters. In recent years, MR harmonization using image synthesis with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both Tw-weighted and T2-weighted images must be available), which limits their applicability. Third, existing methods generally are sensitive to imaging artifacts. In this paper, we present a novel approach, Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), to address these three issues. We first propose an anatomy fusion module that enables HACA3 to respect the anatomical differences between MR contrasts. HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability of HACA3 on downstream tasks with diverse MR datasets acquired from 21 sites with different field strengths, scanner platforms, and acquisition protocols.
translated by 谷歌翻译
In unstructured environments, robots run the risk of unexpected collisions. How well they react to these events is determined by how transparent they are to collisions. Transparency is affected by structural properties as well as sensing and control architectures. In this paper, we propose the collision reflex metric as a way to formally quantify transparency. It is defined as the total impulse transferred in collision, which determines the collision mitigation capabilities of a closed-loop robotic system taking into account structure, sensing, and control. We analyze the effect of motor scaling, stiffness, and configuration on the collision reflex of a system using an analytical model. Physical experiments using the move-until-touch behavior are conducted to compare the collision reflex of direct-drive and quasi-direct-drive actuators and robotic hands (Schunk WSG-50 and Dexterous DDHand.) For transparent systems, we see a counter-intuitive trend: the impulse may be lower at higher pre-impact velocities.
translated by 谷歌翻译
Media bias can significantly impact the formation and development of opinions and sentiments in a population. It is thus important to study the emergence and development of partisan media and political polarization. However, it is challenging to quantitatively infer the ideological positions of media outlets. In this paper, we present a quantitative framework to infer both political bias and content quality of media outlets from text, and we illustrate this framework with empirical experiments with real-world data. We apply a bidirectional long short-term memory (LSTM) neural network to a data set of more than 1 million tweets to generate a two-dimensional ideological-bias and content-quality measurement for each tweet. We then infer a ``media-bias chart'' of (bias, quality) coordinates for the media outlets by integrating the (bias, quality) measurements of the tweets of the media outlets. We also apply a variety of baseline machine-learning methods, such as a naive-Bayes method and a support-vector machine (SVM), to infer the bias and quality values for each tweet. All of these baseline approaches are based on a bag-of-words approach. We find that the LSTM-network approach has the best performance of the examined methods. Our results illustrate the importance of leveraging word order into machine-learning methods in text analysis.
translated by 谷歌翻译